Genome-wide analysis of sulfur-encoding biosynthetic genes in rice (Oryza sativa L.) with Arabidopsis as the sulfur-dependent model plant

Sulfur is an essential element required for plant growth and development, physiological processes and stress responses. Sulfur-encoding biosynthetic genes are involved in the primary sulfur assimilation pathway, regulating various mechanisms at the gene, cellular and system levels, and in the biosynthesis of sulfur-containing compounds (SCCs). In this study, the SCC-encoding biosynthetic genes in rice were identified using a sulfur-dependent model plant, Arabidopsis. A total of 139 AtSCC genes from Arabidopsis were used as reference sequences to search for putative rice SCCs. At a similarity index > 30%, the similarity search against the Arabidopsis SCC query sequences identified 665 putative OsSCC genes in rice. The gene synteny analysis showed a total of 477 syntenic gene pairs comprising 89 AtSCC and 265 OsSCC biosynthetic genes in Arabidopsis and rice, respectively. The phylogenetic tree of the collated SCC-encoding biosynthetic genes (AtSCCs and OsSCCs) was divided into 11 clades of various sizes, each comprising branches of subclades. In clade 1, the nearly equal representation of OsSCC and AtSCC biosynthetic genes implies the most ancestral lineage. A total of 25 candidate Arabidopsis SCC homologs were identified in rice. The gene ontology enrichment analysis showed that the rice-Arabidopsis SCC homologs were significantly enriched in the following terms at a false discovery rate (FDR) < 0.05: (i) biological process: sulfur compound metabolic process and organic acid metabolic process; (ii) molecular function: oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen; and (iii) KEGG pathway: metabolic pathways and biosynthesis of secondary metabolites. Using a separation threshold of fewer than five duplicated blocks, no tandem duplications were observed among the SCC biosynthetic genes distributed across the rice chromosomes.
The comprehensive rice SCC gene description entailing syntenic events with Arabidopsis, motif distribution and chromosomal mapping of the present findings offer a foundation for rice SCC gene functional studies and advanced strategic rice breeding.

Gene ontology (GO) and KEGG pathway enrichment of SCC-encoding biosynthetic genes. The GO and pathway enrichment analysis of rice and Arabidopsis SCC-encoding biosynthetic genes revealed a total of 206, 149 and 37 hits (terms) in biological process (BP), molecular function (MF) and KEGG pathway, respectively. The numbers of terms commonly enriched in both the rice and Arabidopsis SCC-encoding biosynthetic genes were as follows: BP, 30; MF, 34; and KEGG pathway, 9. In BP, the most significantly enriched terms among the rice SCC-encoding biosynthetic genes were sulfation, hormone biosynthetic process and hormone metabolic process, whereas among the Arabidopsis SCC biosynthetic genes the most significantly enriched terms were S-glycoside metabolic process, glycosinolate metabolic process and glucosinolate metabolic process. Sulfur compound metabolic process and organic acid metabolic process were common to both the rice and Arabidopsis SCC biosynthetic genes.
In MF, "oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen" was the most significantly enriched term (with more than 180 hits) in both the rice and Arabidopsis SCC biosynthetic genes. Other terms enriched to a relatively high extent were as follows: (i) oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, NAD(P)H as one donor, and incorporation of one atom of oxygen; (ii) monooxygenase activity; (iii) iron ion binding; (iv) heme binding; (v) tetrapyrrole binding; (vi) metal ion binding; and (vii) N,N-dimethylaniline monooxygenase activity.
The KEGG pathway enrichment showed involvement of the rice-Arabidopsis homologous genes in 10 different KEGG pathways. The largest numbers of genes were enriched in metabolic pathways (80 genes) and biosynthesis of secondary metabolites (67 genes). Tryptophan metabolism and 2-oxocarboxylic acid metabolism were also well represented, with 25 and 21 genes, respectively (Fig. 6).
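The enrichment statistics reported above can be illustrated with a minimal sketch of an over-representation test followed by FDR correction. This is not the ShinyGO implementation; it assumes a one-sided hypergeometric test and the Benjamini-Hochberg procedure, and the function names and all counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """One-sided enrichment p-value P(X >= k).

    N: genes in the background, K: background genes annotated with the term,
    n: genes in the query list, k: query genes annotated with the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts for three terms: (k, n, K, N). Not data from the study.
toy_terms = {
    "term_A": (8, 25, 40, 1000),
    "term_B": (2, 25, 300, 1000),
    "term_C": (5, 25, 60, 1000),
}
raw = [hypergeom_pvalue(*counts) for counts in toy_terms.values()]
fdr = benjamini_hochberg(raw)
significant = [name for name, q in zip(toy_terms, fdr) if q < 0.05]
print(significant)
```

Terms whose adjusted value falls below 0.05 would be reported as enriched, mirroring the FDR < 0.05 cut-off used in this study.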

Chromosomal distributions of the SCC biosynthetic genes in A. thaliana and O. sativa. Highly conserved SCC biosynthetic genes were physically mapped on the Arabidopsis and rice genomes. The SCC biosynthetic genes are unequally distributed across the Arabidopsis and rice chromosomes (Fig. 5). In Arabidopsis, chromosome 1 showed the highest gene number (GN = 10), followed by chromosome 5 (GN = 8), chromosome 2 (GN = 5), chromosome 3 (GN = 3) and chromosome 4 (GN = 1). The rice SCC biosynthetic genes are distributed across all 12 chromosomes except chromosomes 5 and 7. Chromosomes 1, 3, 4, 6, 8, 10, 11 and 12 contain one to three OsSCC biosynthetic genes each, and the highest number of OsSCC biosynthetic genes (GN = 4) is found on chromosomes 2 and 9. No tandem duplications were observed among the SCC biosynthetic genes; no two gene loci are arranged in close proximity, and the genes are separated by more than five duplicated blocks (Fig. 5). OsIPMS1 and OsIPMS2 encode the longest proteins in rice (635 aa), and AtCYP79B3 encodes the longest in Arabidopsis (543 aa). OsIAGLU and AtIPMI2 encode the shortest proteins in rice (113 aa) and Arabidopsis (256 aa), respectively. More than half of the proteins encoded by the SCC biosynthetic genes are acidic, with a theoretical isoelectric point (pI) ranging from 4.63 to 6.24 (Arabidopsis) and from 5.1 to 6.8 (rice). The average molecular weight (MW) of the proteins encoded by the AtSCC biosynthetic genes is 45.94 kDa, and that of the proteins encoded by the OsSCC biosynthetic genes is 43.24 kDa (Table 2).
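The tandem-duplication check described above can be sketched as follows. This is an illustration only: it assumes each gene is described by its chromosome and its ordinal position along that chromosome, and that two genes count as tandem candidates when their positions differ by at most five (the `max_gap` parameter). The exact definition of a "duplicated block" used in the study is not given here, and the gene names and positions are hypothetical.

```python
from collections import defaultdict


def find_tandem_pairs(gene_ranks, max_gap=5):
    """Return neighbouring gene pairs on the same chromosome within max_gap.

    gene_ranks maps gene_id -> (chromosome, rank), where rank is the gene's
    ordinal position along the chromosome (an assumption of this sketch).
    """
    by_chrom = defaultdict(list)
    for gene, (chrom, rank) in gene_ranks.items():
        by_chrom[chrom].append((rank, gene))

    pairs = []
    for members in by_chrom.values():
        members.sort()
        # Comparing consecutive genes is enough to decide whether any
        # tandem candidate exists on this chromosome.
        for (r1, g1), (r2, g2) in zip(members, members[1:]):
            if r2 - r1 <= max_gap:
                pairs.append((g1, g2))
    return pairs


# Hypothetical positions, not data from the study.
toy_genes = {
    "geneA": ("chr2", 100),
    "geneB": ("chr2", 103),
    "geneC": ("chr9", 50),
    "geneD": ("chr2", 400),
}
tandem = find_tandem_pairs(toy_genes)
print(tandem)
```

An empty result for the mapped SCC genes would correspond to the observation that no tandem duplications are present.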

Discussion
Sulfur (S) is a secondary macronutrient that regulates plant physiology, growth and developmental processes such as photosynthesis, biosynthesis of sulfur-containing compounds (SCCs) and hormone biosynthesis. It is the fourth major nutrient for crop production after nitrogen, phosphorus and potassium. In higher plants, S acquisition and assimilation are energy-intensive. S is taken up by plants as sulphate ions, mainly via the roots, and a small amount can be absorbed through the leaves. In rice, S, S-containing genes and the associated SCCs are critically involved in stress-responsive mechanisms 45.
For example, glutathione S-transferase (GST), a detoxification enzyme ubiquitously present in vertebrates and invertebrates, plays an important role in xenobiotic compound detoxification. GST activity is associated with oxidative stress protection as it acts as a mediating substrate in various biochemical reactions, interacts with phytohormones and redox metabolites, and coordinates stress-induced signalling events 10. Glutathione (GSH) mediates abiotic and biotic stress resistance through ROS scavenging as part of the first line of defense 46. Extensive studies have demonstrated GSH-mediated tolerance mechanisms against salinity, drought, heavy metal toxicity, chilling and herbicides in rice, wheat, barley, soybean and canola 47. Studies on the effect of S amendment on the plant defense response have contributed similar evidence. For instance, amending soil with S-containing fertilizer increased the resistance of wheat varieties to brown rust and improved overall productivity 48. Rice yield-impeding factors include pests and pathogens, climate, weather, soil infertility, heavy metal contamination and others. Presently, rice yield enhancement strategies are being vigorously pursued by tapping into various aspects of rice biology. Genetic studies, molecular breeding, genetic engineering, heterosis breeding and population improvement are among the most sought-after tools in modern rice breeding [49][50][51]. Since a large number of studies have linked rice S and SCCs to stress mechanisms and defense responses, a comprehensive annotation of SCC-encoding genes in the rice genome is needed to support enhanced manipulation strategies in breeding approaches [52][53][54][55][56].
In this study, a total of 665 OsSCC biosynthetic genes were identified as homologs of the AtSCC query sequences. A total of 477 syntenic gene pairs (Arabidopsis-rice) and 25 rice SCC biosynthetic genes (AtSCC homologs) were obtained using a comprehensive analysis entailing synteny, phylogeny, conserved motif distribution and gene structure. The synteny analysis identified the gene order and compared the genomic structural changes of the target genes. Shared synteny assumes a common ancestor/evolutionary origin, and syntenic fragments are expected to share a similar function 57,58. The small number of genes identified as Arabidopsis-rice syntenies suggests the early Angiosperm divergence of the monophyletic monocots from their eudicot relatives 59. The monocot rice genome with 5 chromosomes typically diverged from the eudicot Arabidopsis genome (7 chromosomes) of a higher chromosome number 60. The synteny analysis of Arabidopsis-rice SCC biosynthetic genes implies the ancient existence of SCC biosynthetic genes, even before the Arabidopsis-rice (eudicot-monocot) divergence. The SCC biosynthetic gene distribution pattern suggests that an expansion event occurred during evolution, possibly through gene co-localization or inter-chromosomal translocation 61. The phylogenetic and gene structure patterns of the SCC-encoding biosynthetic genes suggest exon loss and gain events during Arabidopsis-rice (eudicot-monocot) evolution. The exon-intron arrangement pattern in 25 AtSCC and 18 OsSCC genes suggests that the species-specific genome features are conserved 62. The mosaic patterning of the SCC gene exon-intron regions could be associated with the evolutionary forces that shaped the structural dynamics of the SCC biosynthetic genes.
Table 1. Mining for Oryza sativa sulfur-encoding biosynthetic genes (OsSCC) with Arabidopsis sulfur-encoding biosynthetic gene (AtSCC) input data. The selection criteria are as follows: (1) synteny events; (2) phylogenetic clade; (3) motif composition (Os/At); and (4) number of exons (EN) with AtSCC biosynthetic genes (Os/At).

Selection of OsSCC biosynthetic genes
Motifs are frequently occurring (conserved) regions within a DNA sequence. Found within regulatory regions such as promoters and 3′ UTRs, these 4-10 base pair motifs carry significant genome regulatory functions. Two species are likely to be close relatives if they share a high content of common motifs 63. During speciation, mutations lead to either an accumulation or a loss of motifs (motif turnover); thus, a motif content analysis is often regarded as more advantageous than a sequence similarity search. Our results showed that at least 10 different motifs identified in the Arabidopsis and rice SCC-encoding biosynthetic genes have similar distribution patterns by clade. The OsSCC biosynthetic genes identified in this study showed potential functional roles in the plant defense response. In clade 1, LOC_Os02g42330 (nitrilase 1), the syntenic pair of At3g44300 (nitrilase 2), was reported to participate in the tryptophan-dependent pathway of auxin biosynthesis in rice 64. Three OsSCC biosynthetic genes from clade 10 were characterized as O-methyltransferase genes, which are key to indolic glucosinolate modification in Arabidopsis. As shown in Table 1, five β-glucosidase genes from clade 4 showed synteny with β-glucosidase 34 (AtBGLU34). AtBGLU34 plays a major role in the response to salt stress 65 and in indolic glucosinolate biosynthesis 66 in Arabidopsis.
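The idea of comparing genes by shared motif content can be made concrete with a small sketch. It assumes that each gene has already been annotated with a set of motif labels and uses the Jaccard index as the similarity measure; both the measure and the motif labels below are illustrative choices, not the method used in this study.

```python
def motif_jaccard(motifs_a, motifs_b):
    """Jaccard index of two motif collections (shared motifs / all motifs)."""
    a, b = set(motifs_a), set(motifs_b)
    if not a and not b:
        # Nothing to compare; treat two empty sets as having no similarity.
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical motif annotations for one rice gene and one Arabidopsis gene.
rice_gene_motifs = ["motif1", "motif2", "motif3"]
arabidopsis_gene_motifs = ["motif2", "motif3", "motif4"]

similarity = motif_jaccard(rice_gene_motifs, arabidopsis_gene_motifs)
print(similarity)
```

A value near 1 indicates that two genes carry almost the same motifs, which is the pattern expected for genes within the same clade.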
The SCC biosynthetic genes that fall within the same phylogenetic clade and carry similar motif patterns possibly share a similar function. The unique motifs in each clade could be associated with specific functional roles of the SCC biosynthetic genes. The current findings shed light on the potential functional roles of SCC biosynthetic genes in rice, as more than half of the genes were putatively involved in the biosynthesis of aliphatic and indolic glucosinolates. Based on the gene ontology and pathway enrichment analysis, the Arabidopsis-rice homologous SCC-encoding genes were significantly enriched in the sulfur compound metabolic process (BP); oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen (MF); and biosynthesis of secondary metabolites (KEGG pathway) (Fig. 6). This may suggest a role of the SCC-encoding genes in S assimilation, whereby the reduction of sulphate to sulphide and the subsequent synthesis of S-containing amino acids (methionine and cysteine) via adenosine phosphosulphate (APS) and phosphoadenosine phosphosulphate (PAPS) are catalyzed by the participating enzymes.
Figure 6. Gene ontology (GO) and pathway enrichment analysis. The bubble plot represents the top 20 significantly enriched terms of the Arabidopsis-rice homologous SCC-encoding genes. The GO terms are presented in (i-ii) biological process and (iii-iv) molecular function, whereas the KEGG pathways are presented in (v-vii). Red arrows represent the terms shared among the Arabidopsis-rice orthologous genes. The results are visualized at P < 0.05 using ShinyGO v0.75 (http://bioinformatics.sdstate.edu/go75/).
In plant breeding strategies, exploiting naturally occurring genetic variation is fundamental to controlling genes of agronomic importance. The physical maps of rice SCC biosynthetic genes provided in this study could be harnessed for breeding techniques that manipulate chromosomal regions, such as targeted chromosome-segment substitution 67 and hotspot chromosomal regional positioning of desirable candidate genes 68. The findings enable the selection of desirable target rice genes that are tightly linked to S and SCC-encoding genes with a putative functional role in stress response mechanisms.

Conclusions
Rice SCC biosynthetic genes show syntenic associations with their Arabidopsis homologs (AtSCCs). The high degree of conservation between the AtSCC and OsSCC genes suggests a long conservation history, which could be implicated in SCC gene functions in the plant defense response. The present findings not only identify the rice SCC-encoding genes (OsSCC) but also extend to chromosome-level mapping, to better inform new directions in rice functional research and breeding manipulation strategies.